Checkpointing mechanism for fault-tolerant systems.



A checkpointing mechanism implemented in a data processing system comprising a dual processor
configuration gives the system a fault tolerance capability while minimizing the complexity of both the software
and the hardware.

The active and backup processors are coupled asynchronously with some hardware assist functions
comprising a memory change detector, which captures the memory changes in the memory of the active
processor, and a mirroring control circuit, which causes the memory changes, when committed by establish
recovery point signals generated by the active processor, to be dumped into the memory of the backup
processor so that the backup processor can resume the operations of the active processor from the last
established recovery point.

The active and backup processors may each be connected to a dedicated memory and recovery point
storing means, or to a memory including two dual sides shared by all the processors for storing data structures
and recovery points.
CHECKPOINTING MECHANISM FOR FAULT-TOLERANT SYSTEMS

Description of the Invention

Field of the Invention

The present invention relates to a checkpointing mechanism for providing information processing
systems with fault tolerance capabilities.

Background Art

Fault tolerance is an emerging requirement in information processing systems such as processors
or communication controllers. These machines have to be designed so as to minimize the failure rate and
improve failure diagnosis and localization so as to minimize repair times. However, hardware and
software failures lead most of the time to a machine disruption.

The availability requirements placed on such controllers are becoming more and
more drastic since these machines should provide their intended services, as viewed by the user, twenty
four hours a day, without any interruption. As there is no means to prevent hardware failures, the
machine must be designed so that the failures do not disrupt the service. Machines so designed are
said to be fault tolerant.

Such machines already exist. They are based on two different concepts.
Machines based on the first concept make use of tightly coupled units which synchronously execute the
same program instructions. Examples of such machines are described in US Patent 4,654,857 and
European patent application 286 856.

A major drawback of this type of machine results from the fact that perfect synchronism has to be
maintained between the duplicated units. In addition, instantaneous detection of the faults is required.
By contrast, the machines based on the second concept make use of an active unit associated with a
backup unit. The backup unit is dormant and replaces the active unit in case of failure.

Examples of such machines are described in French Patent 2,261,568. This concept is also used in the Tandem
NONSTOP systems.

French Patent 2,261,568 describes a multiprocessor configuration wherein a faulty processor can be
replaced by a backup processor. When the failure is detected, a control unit saves information from which
the other processor can execute the tasks of the failing processor. This system does not provide any means
to place the backup processor in the state of the active processor before the failure detection. In addition,
some failures can prevent the failing processor state from being saved.

In the Tandem NONSTOP systems, the backup processor is provided with a copy of the data of the
task executed by the active primary processor. Periodically, it receives a message indicative of the primary
processor status. For the backup processor to receive the copy of the data of the task executed by
the primary processor, the task executed in the primary processor must sort the data it has
handled and send the sorted data to the backup processor. This sorting process is complex and results in
an overhead which is unacceptable in a real time system such as a communication controller.

Objects of the Invention 

An object of the invention is to provide an improved checkpointing mechanism for a fault tolerant
system comprised of active units and backup units, which does not degrade the performance of the active
unit.

Another object of the invention is to provide such a mechanism which is transparent for the tasks 
executed in the active units. 

Another object of the invention is to provide such a mechanism which does not add to the complexity of 
the system software. 


Summary of the Invention 

The checkpointing mechanism according to the subject invention allows the working process of an
active processor to be resumed by a backup processor when the active processor fails. This
mechanism is associated with at least one pair of information processing units comprising a first and a
second information processing unit. Each information processing unit includes a processor for running a
program stored in a memory. This memory is attached to the processor through a memory bus comprising
data, address and control lines, or it is comprised of a shared memory to which all of a plurality of
processors are connected by an interconnection network. The processors can be set in an active, backup or
fail status under control of a configuration controller responsive to failure detecting means associated with
each processor and detecting whether the associated processor is failing or not. The checkpointing
mechanism comprises:

- first memory change detecting means associated with at least the information processing unit whose
processor is initially set in the active status by the configuration controller, to receive the address and data
on the memory bus causing the memory content to be changed and generate memory change records
therefrom,

- first signalling means in said information processing unit whose processor is initially set in the active status
by the configuration controller, responsive to a signal provided by said processor at selected points of the
program to generate an establish recovery point signal,

- first storing means associated with at least the information processing unit whose processor is initially set in
the backup status by the configuration controller and connected to said first
memory change detecting means to store the memory change records received from said first memory
change detecting means,

- first control means associated with said first storing means and responsive to the establish recovery point
signal received from the first signalling means to cause a separating record to be stored in the first storing
means, and the memory change records to be read from the first storing means and written in the memory
of the information processing unit whose processor is initially set in the backup status, as long as
separating records are stored in the first storing means, so that, once set in an active status, the backup
processor can resume the working process of the active status processor when its status is switched from
the active status to the fail status.

Brief Description of the Figures

Figure 1 represents the block diagram of the checkpointing mechanism according to the subject invention
when implemented in a system comprising two units.
Figure 2 represents a memory change detector circuit (28-1 of Figure 1).
Figure 3 represents a mirroring control circuit (30-2 of Figure 1).
Figures 4 and 5 represent the state diagrams of the finite state machine in the mirroring control circuit.
Figure 6 represents the state diagram of the configuration controller.
Figure 7 represents the checkpointing mechanism when implemented in a multiprocessor system
having processor pairs connected to one another through an interconnection network.
Figure 8 represents another implementation of the checkpointing mechanism, wherein several single
processors are interconnected through a shared intelligent memory.

Detailed Description of the Invention

For a better understanding of the invention, the checkpointing mechanism will first be completely
described in relation to Figures 1 to 6, in a configuration comprising two processing units backing up each
other, each of which includes a processor and a dedicated memory. Then, the invention will be
generalized and applied to a plurality of processor pairs connected through an interconnection network
(Figure 7) and further, to a plurality of processors sharing a common intelligent memory (Figure 8).

The mechanism according to the subject invention allows fault tolerance to be implemented in a
multiprocessor system comprising multiple interconnected and identical processors. It makes use of a
duplex redundancy scheme, i.e. the system comprises processor pairs. In each pair one processor is active,
i.e. it effectively performs the processing work, while the other one, called the backup processor, is dormant
and activated only in case its companion processor fails. It is assumed that the switching is dynamic.

According to the invention, a program roll-back technique is used which allows the system to recover
from a failure in an active processor by restarting the processing work in a backup processor at a
previously reached point of the processing work. Recovery points are distributed along the processing path
and the state of the processor at these recovery points is saved. The state of the processor comprises the
set of all the variables present in the processor memory, registers or even in some part of the hardware,
which affect the future processor behavior.

The state of a processor is saved in a memory device whose failures are not correlated with the
failures of the active processor. The state can thus be retrieved by the backup processor. The processing is
therefore sliced into successive computing stages bounded by two successive recovery points.

EP 0 441 087 A1

The processors have a failstop design, which means that a failure causes a processor to stop working
and suspend any external actions.

In such an environment the main problem to be solved consists in the establishment of synchronization
points between the active and backup processors.

Also, the detection of the failures and the taking over by the backup processor have to be performed. A
number of conditions should be met to ensure that the active processor is indeed inoperative, allowing the
backup processor to resume the operations.
Figure 1 represents the mechanism of the present invention when implemented in a system comprising
only one pair of data processing units 10-1 and 10-2.

Each unit comprises a processor 12 working under control of a control program stored in a memory 14.
The processors and memories are referenced 12-1 and 14-1 in unit 10-1 and 12-2 and 14-2 in unit 10-2.
The same control program is loaded into memories 14-1 and 14-2.
The memories 14-1 and 14-2 are attached to processors 12-1 and 12-2 through memory busses 16-1
and 16-2 respectively, comprising address, data and control wires as is conventional.
Failure detector devices 18-1 and 18-2, which may be of any known type, are arranged to detect the
processor failures and generate an active signal on FAIL 1 line 20-1 or FAIL 2 line 20-2 when they detect a
failure in processor 12-1 or 12-2, respectively.
In Figure 1, the failure detectors are represented as separate devices for the sake of
explanation. In fact, these failure detector circuits comprise a plurality of checkers such as parity checkers,
power failure detectors, etc., located at selected points inside the processors.

Each processor can be in any of three statuses, called ACTIVE, BACKUP or FAIL, as assigned by a
configuration controller 22 which determines the processor status from the status of the FAIL signals it
receives from the lines 20-1 and 20-2. In response to these signals, the configuration controller generates
status control signals on busses 24-1 and 24-2 which are provided to status handler circuits 26-1 and 26-2
in the units 10-1 and 10-2 respectively, which cause said units to be set in a given status by means
of processor control signals on busses 27-1 and 27-2.
The detected failures which are reported to the configuration controller 22 order a switch over of the
status of the failing processor from ACTIVE to FAIL status and that of its companion from BACKUP to
ACTIVE status.

Depending upon the status of the FAIL-1 and FAIL-2 lines 20-1 and 20-2, configuration controller 22
activates one line of busses 24-1 and 24-2. An active signal on line 24-1A or 24-2A is intended to set
processor 12-1 or 12-2 in the ACTIVE status. An active signal on line 24-1B or 24-2B is intended to set
processor 12-1 or 12-2 in the BACKUP status. Active signals on lines 24-1F and/or 24-2F are intended to
set processors 12-1 and/or 12-2 in the FAIL status.

The mechanism according to the present invention comprises memory change detectors 28-1 and 28-2
and mirroring control circuits 30-1 and 30-2 in the units 10-1 and 10-2 respectively. It also comprises a
mirror bus 34 which is the dedicated path between units 10-1 and 10-2 for transferring the state data
between the active processor and its backup processor, as will be explained later on.

The operation of the checkpointing mechanism will be described assuming that unit 10-1 is the active
unit and unit 10-2 is the backup unit. The control signal from line 24-1A which sets processor 12-1 in its
ACTIVE status is provided to memory change detector 28-1 and the control signal from line 24-1B which
sets processor 12-1 to its BACKUP status is provided to mirroring control circuit 30-1, to cause the memory
change detector 28-1 to be activated and mirroring control circuit 30-1 to be inactive when unit 10-1 is
active.

Conversely, in unit 10-2, the memory change detector 28-2 is inactive and the mirroring control circuit
30-2 is activated, by means of the status control signals on lines 24-2B and 24-2A.

The state data are set up in unit 10-1 by the memory change detector 28-1, which is responsive to the
write signal on bus 16-1 to generate a record of the memory changes comprising at least the memory
address and data present on bus 16-1 when a write operation is performed.

These records are provided through STATE DATA wires 40-1 of mirror bus 34 to the write ahead queue 
WAQ memory 32-2 in the backup unit 10-2, where they are accumulated. The active program running in 
processor 12-1 is not involved in this process. 
The memory changes constitute the difference between the active and backup memory states. From
now on, it is assumed that the processor state is equal to the memory state or, in other words, that the
variables which determine the processor behavior from a given point onwards must reside in memory
exclusively.
It follows that the backup and active processors can be synchronized by applying the active memory
changes to the backup memory.

At appropriate points of the working process performed by the program stored in memory 14-1, for
example at the end of each task, the active program issues an ESTABLISH RECOVERY POINT (ERP)
instruction. This instruction is not a new processor instruction. It may be, for example, a READ memory
instruction specifying a memory address outside the memory address range or a memory address specially
dedicated to the ERP instruction. This address will be referred to as the ERP address. It is detected
by detector 28-1, which activates ERP line 42-1 in response thereto.
The active ERP signal on line 42-1 is provided to mirroring control circuit 30-2, which generates a
SEPARATOR record which is written into the write ahead queue WAQ 32-2 through bus 43-2. Also, the
mirroring control circuit 30-2 activates the read control line 44-2 to cause all records in the queue up to the
most recently reached recovery point to be dumped into the memory 14-2 of the backup unit 10-2, as will
be described later on.
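
The commit behavior described above can be illustrated by a short software model. This is only a sketch under stated assumptions: the class name, the tuple layout of the records and the use of a dictionary for memory 14-2 are hypothetical, and the actual mechanism is a hardware queue driven by the mirroring control circuit rather than a program.

```python
from collections import deque

SEPARATOR = ("SEP",)  # hypothetical stand-in for the SEPARATOR record


class WriteAheadQueue:
    """Toy model of WAQ 32-2 plus backup memory 14-2 (illustrative names)."""

    def __init__(self):
        self.records = deque()   # FIFO of pending records
        self.backup_memory = {}  # address -> data, stands in for memory 14-2

    def memory_change(self, address, data):
        # A write seen on the active memory bus is queued, not yet applied.
        self.records.append(("CHG", address, data))

    def establish_recovery_point(self):
        # ERP: append a separator, then dump every record up to the
        # most recently queued separator into the backup memory.
        self.records.append(SEPARATOR)
        committed = sum(1 for r in self.records if r == SEPARATOR)
        while committed:
            record = self.records.popleft()
            if record == SEPARATOR:
                committed -= 1
            else:
                _, address, data = record
                self.backup_memory[address] = data


waq = WriteAheadQueue()
waq.memory_change(0x100, 1)
waq.memory_change(0x104, 2)
waq.establish_recovery_point()  # both writes reach the backup memory
waq.memory_change(0x100, 99)    # uncommitted: stays in the queue
print(waq.backup_memory)        # {256: 1, 260: 2}
```

In this model the write of 99 remains queued, so a failure of the active unit at that moment would leave the backup memory at the state of the last recovery point.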

During each computing stage, before the Establish Recovery Point instruction is issued, when the
control program of the active processor determines the address of the entry point of the next computing
stage, it issues a STORE instruction which causes the entry point of the next computing stage (i.e. the
address of the first instruction of the next computing stage) to be stored at a fixed memory address
referred to as the Next Entry Point address.

This STORE operation will be detected as a memory change by memory change detector 28-1 and the
corresponding memory change record is queued in the write ahead queue WAQ 32-2 to be loaded into the
memory 14-2 of the backup unit 10-2.

If the program reaches a recovery point, a computing stage has been successfully executed by the
active processor 12-1. The processor state is updated in the backup memory 14-2. If the active processor
12-1 fails before the program reaches a recovery point, the backup processor state is not updated and the
value set up at the most recently reached recovery point is kept in the memory 14-2 at the Next Entry Point
address.

When the status of the unit 10-2 is switched from backup to active under control of the configuration
controller 22 and status handler 26-2, the processor program resident in memory 14-2 waits for the
completion of the current dump operation into memory 14-2, if any, and starts executing the program at the
instruction address which is read from the fixed Next Entry Point memory address.
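
The roll-back behavior around the Next Entry Point address can be sketched as follows. The address value, the class and method names and the stage labels are assumptions chosen for the example; the description does not specify them.

```python
NEXT_ENTRY_POINT = 0x0000  # hypothetical fixed address of the Next Entry Point


class MirroredPair:
    """Toy model of an active memory, a backup memory and uncommitted changes."""

    def __init__(self, first_entry):
        self.active = {NEXT_ENTRY_POINT: first_entry}
        self.backup = {NEXT_ENTRY_POINT: first_entry}
        self.pending = []  # (address, data) changes not yet committed

    def store(self, address, data):
        # Every STORE in the active unit also produces a memory change record.
        self.active[address] = data
        self.pending.append((address, data))

    def establish_recovery_point(self):
        # Committed changes are applied to the backup memory in order.
        for address, data in self.pending:
            self.backup[address] = data
        self.pending.clear()

    def fail_over(self):
        # Changes of the aborted stage are never applied to the backup;
        # the backup resumes at the entry point saved at the last ERP.
        self.pending.clear()
        return self.backup[NEXT_ENTRY_POINT]


pair = MirroredPair("stage_1")
pair.store(0x10, "result of stage 1")
pair.store(NEXT_ENTRY_POINT, "stage_2")
pair.establish_recovery_point()         # stage 1 is committed

pair.store(0x10, "partial result of stage 2")
pair.store(NEXT_ENTRY_POINT, "stage_3")
resume_at = pair.fail_over()            # failure before the next ERP
print(resume_at)                        # stage_2
print(pair.backup[0x10])                # result of stage 1
```
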
The mirror bus 34 links the two units 10-1 and 10-2.

It is used exclusively to allow the active processor to mirror its memory changes into the backup
processor memory. Any kind of bus may be used provided that it presents the following capabilities:
1- It must be independent from the functional data paths of the units 10-1 and 10-2.
2- It must be directional, with the direction under control of the statuses of the two processors. This
capability has been schematically represented in Figure 1 by the provision of STATE DATA wires 40-1
and ERP line 42-1 from memory change detector 28-1 to write ahead queue WAQ 32-2 and mirroring
control circuit 30-2, and of STATE DATA wires 40-2 and ERP line 42-2 from memory change detector 28-2
to write ahead queue WAQ 32-1 and mirroring control circuit 30-1. Only the wires from the active
processor to the backup processor are active. The other direction is inhibited for any other combination
of statuses. Thus, the failing processor does not interfere with the active processor.
This function is shown schematically in Figure 1 by the bus drive boxes 46-1 and 46-2 which are
responsive to signals on lines 24-1A and 24-2A to control the direction of the transfers on the mirror bus 34,
depending on which processor 12-1 or 12-2 is active.
A memory change detector circuit 28 which may be used in units 10-1 and 10-2 is shown in Figure 2. It
is assumed that this circuit is the one located in the active unit 10-1, thus the suffix 1 is added to the
reference numbers.

The memory bus 16-1 comprises address lines 50-1, data lines 52-1, byte select lines BS 54-1 and
read/write R/W control line 56-1. It is assumed that the address lines are able to carry three bytes, the data
lines are able to carry four bytes and that the memory is provided with a byte select capability allowing
only selected bytes to be updated in the addressed memory location.

The address, data, byte select and read/write control lines of bus 16-1 are provided to a memory
change record generator 58-1 which is active when the processor 12-1 is active. This record generator is
responsive to the signal on R/W control line 56-1 indicative of a write memory operation to gate the address
and data bits present on the address and data lines 50-1 and 52-1 into register 60-1, and generates a control
field which is also provided to register 60-1. The control field contains information identifying this record as
a memory change record and information derived from the byte select signal, specifying the bytes to be
changed in the addressed memory location.
The memory change record generated in register 60-1 is provided through bus drive box 46-1 to be
written into the write ahead queue 32-2.
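
By way of illustration only, a memory change record of this kind can be sketched in Python. The description states that the record carries a three-byte address, four bytes of data and a control field derived from the byte select signal; the 8-byte layout, the record-type codes, the bit ordering of the byte select mask and the function names below are assumptions made for the example.

```python
MEMORY_CHANGE = 0x1  # hypothetical record-type codes held in the control field
SEPARATOR = 0x2


def encode_change(address, data, byte_select):
    """Pack control field (1 byte), address (3 bytes) and data (4 bytes)."""
    control = (MEMORY_CHANGE << 4) | (byte_select & 0xF)
    return bytes([control]) + address.to_bytes(3, "big") + data.to_bytes(4, "big")


def decode(record):
    """Return (record type, byte select mask, address, data)."""
    control = record[0]
    return (control >> 4, control & 0xF,
            int.from_bytes(record[1:4], "big"),
            int.from_bytes(record[4:8], "big"))


def apply_change(memory, record):
    """Update only the selected bytes of the addressed 4-byte location."""
    rtype, byte_select, address, data = decode(record)
    if rtype != MEMORY_CHANGE:
        raise ValueError("not a memory change record")
    word = memory.get(address, 0)
    for i in range(4):                 # assumed: bit i selects byte i (LSB first)
        if byte_select & (1 << i):
            mask = 0xFF << (8 * i)
            word = (word & ~mask) | (data & mask)
    memory[address] = word


memory = {0x000100: 0x11223344}
record = encode_change(0x000100, 0xAABBCCDD, 0b0101)
apply_change(memory, record)
print(len(record), hex(memory[0x000100]))  # 8 0x11bb33dd
```

Carrying the byte select mask in the record lets the backup side reproduce a partial-word write exactly, without needing to read the active memory.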

An ERP detector 62-1, which is responsive to the signal on the R/W control line indicative of a read
operation and to the memory address on bus 50-1 being equal to the ERP address (meaning that the
program issues the establish recovery point instruction), activates the ERP line 42-1 through the bus drive 46-1.
This commits the memory change records stored up to that point in the write ahead queue 32-2.
The-memitfiV' 1 ^ * ' r 1 --. 

A write ahead queue 32 and a mirroring control circuit 30 are shown in Figure 3. It is assumed that they
are located in the backup unit 10-2, thus a suffix number 2 is added to the reference numbers in Figure 3.
The write ahead queue WAQ is a dual-port memory with a write port and a read port and a first-in first-out
(FIFO) access. It is used to temporarily queue the memory change records received from the bus 40-1. The
queued records can be dequeued, i.e. read and erased from the WAQ queue, under control of the mirroring
control circuit 30-2 as will be described later on. Concurrent read and write accesses are authorized since
the WAQ has separate write and read ports.

The function of the mirroring control circuit 30-2 is to apply the memory changes accumulated in the
WAQ queue 32-2 to the backup memory 14-2 when these changes have been committed by an active
Establish Recovery Point ERP signal from ERP line 42-1.

The mirroring control circuit 30-2 comprises a finite state machine 70-2, a counter 72-2 and a register 74-2
which contains a separator pattern. This pattern has the same format as a memory change record and is identified
as such by a particular control field.

The state diagrams of the finite state machine 70-2 are shown in Figures 4 and 5. When the signal from
line 24-2B sets unit 10-2 in the backup state, the finite state machine starts working. The first operation
(operation 90) consists in testing the ERP line 42-1. If it is found active (ON), finite state machine 70-2
activates line 78-2, which causes the separator pattern contained in register 74-2 to be written into the write
ahead queue 32-2 (operation 92). Then, finite state machine 70-2 activates increment line 80-2, which
causes the counter 72-2 to be incremented by 1 (operation 94).

The counter value reflects the number of committed recovery points which are to be serviced by the
mirroring control circuit.

Then, the ERP line 42-1 is tested again and, if it is found inactive, the process is resumed at operation
90. If not, it is resumed at operation 96, in order to wait for the drop of the ERP signal.

As long as there are committed memory change records in the WAQ queue 32-2, the finite state
machine 70-2 generates a read control signal on line 44-2 and receives the records read from the WAQ
queue through bus 82-2 to translate them into the appropriate address, data and byte select information,
which are provided to memory bus 16-2 to update the backup memory 14-2. The mirroring control circuit
30-2 has a direct memory access capability, which means that it can access the memory without assistance
from the processor 12-2.

The state diagram of machine 70-2 describing these operations is shown in Figure 5.

First, the finite state machine 70-2 tests the value in counter 72-2 (operation 100) and waits until this
value becomes different from 0, which means that committed memory change records generated during
a computing stage have been accumulated. When the counter value is different from 0, read line 44-2 is
activated and a record is read from the WAQ queue (operation 102).
Then, finite state machine 70-2 tests whether this record is a memory change (operation 104). If yes, it
translates the record and writes the corresponding memory change into the memory 14-2 (operation 106),
then operation 100 is resumed.
If the read record is not a memory change, the finite state machine tests whether this record is a
separator record (operation 106). If yes, it activates line 84-2 which causes the content of counter 72-2 to be
decremented (operation 108), since the memory change data corresponding to a computing stage have
been dumped into the backup memory 14-2.
If no, an error signal is raised on line 86-2.
Though these operations 90 to 108 could be performed in the backup processor, they are preferably
performed by a finite state machine, which is a hardwired logic circuit, to match the speed at which the
memory changes may occur.
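
The two loops of Figures 4 and 5 can be modelled in software as a minimal sketch, keeping in mind that the description prefers a hardwired circuit. The class and method names, the record tuples and the `erp_seen` flag (which stands in for waiting at operation 96 until the ERP line drops) are assumptions of this example.

```python
from collections import deque

CHANGE, SEP = "change", "separator"  # hypothetical record kinds


class MirroringControl:
    """Software model of finite state machine 70-2 with counter 72-2."""

    def __init__(self):
        self.waq = deque()       # write ahead queue 32-2
        self.rp_counter = 0      # counter 72-2
        self.erp_seen = False    # True while the ERP line has not dropped yet
        self.backup_memory = {}  # stands in for memory 14-2

    def sample_erp(self, erp_line):
        """Figure 4: one separator and one increment per ERP pulse."""
        if erp_line and not self.erp_seen:
            self.waq.append((SEP, None, None))  # operation 92
            self.rp_counter += 1                # operation 94
        self.erp_seen = erp_line                # wait for the line to drop

    def dump_step(self):
        """Figure 5: service one record if a recovery point is committed."""
        if self.rp_counter == 0:                  # operation 100
            return False
        kind, address, data = self.waq.popleft()  # operation 102
        if kind == CHANGE:
            self.backup_memory[address] = data    # write into memory 14-2
        elif kind == SEP:
            self.rp_counter -= 1                  # operation 108
        else:
            raise RuntimeError("unknown record")  # error line 86-2
        return True


mc = MirroringControl()
mc.waq.append((CHANGE, 0x20, 7))
mc.sample_erp(True)
mc.sample_erp(True)               # line still high: counted only once
mc.sample_erp(False)
mc.waq.append((CHANGE, 0x24, 8))  # belongs to the next, uncommitted stage
while mc.dump_step():
    pass
print(mc.rp_counter, mc.backup_memory, len(mc.waq))  # 0 {32: 7} 1
```

Because the counter is decremented only when a separator is dequeued, records of an uncommitted stage are never read out of the queue.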

When the status of the processor 12-2 is switched from backup to active, the processor issues an
input/output instruction, READ RP COUNTER 72-2, to determine whether all the committed memory changes
queued in the WAQ queue have been serviced. A read of the RP counter 72-2 followed by a test for zero
performs that check. If the RP counter is not at 0, the processor waits until the mirroring control
circuit 30-2 completes the copy of the memory changes into the memory 14-2. When the counter is at 0, the
processor 12-2 issues a READ instruction at the Next Entry Point memory address to get the address of the
first instruction to be executed by processor 12-2 to resume the working process interrupted by the failure
in processor 12-1.

The status handlers 26-1 and 26-2 in Figure 1 receive the status control signals from the configuration
controller 22.

They force interrupts to the processors in response to the switching order from the FAIL status to
BACKUP status, from the BACKUP to ACTIVE status or from the FAIL to ACTIVE status, as indicated by the
status control signals on status control busses 24-1 and 24-2. These interrupts notify the interrupt causes to
the processor programs in order that the program be set idle when the processor is to be set in the
BACKUP status or start running when the processor is to be set in the ACTIVE status.

Configuration controller 22 also comprises a finite state machine which activates the lines of busses 24-1
and 24-2, as shown in Figure 6.

There are five possible states for the processors 12-1 and 12-2, which are set by configuration
controller 22. The states are shown in boxes 102, 104, 106, 108 and 110 in Figure 6. The table hereunder
indicates the status of the lines 24-1A, 24-1B, 24-1F, 24-2A, 24-2B and 24-2F for setting the processors 12-1
and 12-2 in the five states.



State   Proc. 12-1   Proc. 12-2   24-1A   24-1B   24-1F   24-2A   24-2B   24-2F

102     Active       Backup       ON      OFF     OFF     OFF     ON      OFF
104     Fail         Active       OFF     OFF     ON      ON      OFF     OFF
106     Backup       Active       OFF     ON      OFF     ON      OFF     OFF
108     Fail         Fail         OFF     OFF     ON      OFF     OFF     ON
110     Active       Fail         ON      OFF     OFF     OFF     OFF     ON

The events which cause the switching from one state to another state are shown by the arrows in
Figure 6. For example, when the processors are in state 102, which means that processor 12-1 is the active
processor and processor 12-2 is the backup processor, the configuration controller sets the state 110 if
processor 12-2 fails.

The event which causes the switching from state 102 to state 110 is the switching of line 20-2 from the
OFF to the ON status while line 20-1 is OFF.

All the possible events which cause the states to be switched are shown below.

SWITCHING     EVENT
FROM   TO

102    104    E1 = 20-1: OFF --> ON     20-2: OFF
104    106    E2 = 20-1: ON  --> OFF    20-2: OFF
104    108    E3 = 20-2: OFF --> ON     20-1: ON
106    104    E4 = 20-1: OFF --> ON     20-2: OFF
106    110    E5 = 20-2: OFF --> ON     20-1: OFF
110    102    E6 = 20-2: ON  --> OFF    20-1: OFF
102    110    E7 = 20-2: OFF --> ON     20-1: OFF
110    108    E8 = 20-1: OFF --> ON     20-2: ON
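
The two tables can be combined into a small software model of the configuration controller. This is a sketch under assumptions: the FAIL lines are sampled as levels rather than as edges, the target of event E3 is read as state 108, combinations not listed in the event table leave the state unchanged, and all names are hypothetical.

```python
LINES = ("24-1A", "24-1B", "24-1F", "24-2A", "24-2B", "24-2F")

# State number -> (status of processor 12-1, status of processor 12-2)
STATES = {
    102: ("Active", "Backup"),
    104: ("Fail", "Active"),
    106: ("Backup", "Active"),
    108: ("Fail", "Fail"),
    110: ("Active", "Fail"),
}

# (current state, FAIL line 20-1, FAIL line 20-2) -> next state
TRANSITIONS = {
    (102, True, False): 104,   # E1
    (104, False, False): 106,  # E2
    (104, True, True): 108,    # E3
    (106, True, False): 104,   # E4
    (106, False, True): 110,   # E5
    (110, False, False): 102,  # E6
    (102, False, True): 110,   # E7
    (110, True, True): 108,    # E8
}


def status_lines(state):
    """Return the ON (True) / OFF (False) level of each status control line."""
    s1, s2 = STATES[state]
    levels = [s1 == "Active", s1 == "Backup", s1 == "Fail",
              s2 == "Active", s2 == "Backup", s2 == "Fail"]
    return dict(zip(LINES, levels))


def next_state(state, fail_1, fail_2):
    """Apply one sample of the FAIL lines; unlisted combinations keep the state."""
    return TRANSITIONS.get((state, fail_1, fail_2), state)


state = 102
state = next_state(state, True, False)   # processor 12-1 fails (E1)
print(state, status_lines(state)["24-2A"])  # 104 True
state = next_state(state, False, False)  # FAIL 1 line drops again (E2)
print(state, STATES[state])              # 106 ('Backup', 'Active')
```
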



So as to allow any unit (10-1 or 10-2) to be initially set in active or backup status, a memory change
detector, a mirroring control circuit and a write ahead queue are associated with each unit. This gives
flexibility to the system.

Obviously, if this flexibility is not desired, i.e. if one unit 10-1 is the normally active unit and the other one 10-2
is the normally backup unit, only one memory change detector 28-1, one write ahead queue 32-2 and
one mirroring control circuit 30-2 are required.

In order to optimize the processor utilization, the backup processor can be used for running another
program stored in the memory in an address space which is distinct from the address space dedicated to
the backup function.

As stated above, the checkpointing mechanism described in reference to Figures 1 to 6, can be 
implemented in a multiprocessor system as shown in Figure 7. Only two processor pairs 120 and 122 are 
represented in this Figure. As schematically shown in Figure 7, the processors, memories , checkpointing 
mechanism, fail detector circuits, status handlers and configuration controller of each pair are arranged as 
45 shown in Figure 1 . 

The processor pairs are interconnected through an interconnection network 1 24 which is assumed to be 
fault free. 

The processors communicate between them through messages exchanged via the interconnection 
network. The communication messages are generated by the active processor of the source pair to the 
so active processor of the destination pair. 

The messages are exchanged from the memory of the active source processor to the memory of the
active destination processor via interconnection adapters which interface the memory busses with the
interconnection network. These adapters are schematically shown as boxes 126, 128, 130 and 132.

The backup processor in the destination pair does not directly receive the messages through the
interconnection network but gets them through the checkpointing mechanism.

The implementation of the checkpointing mechanism in this multiprocessor environment implies that the
interconnection network has an addressing scheme which identifies the processor pairs and not the
individual processors, so that the sender does not have to know which particular processor in the destination
pair is active.
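
As an illustration of pair-level addressing, the following minimal Python sketch routes a message to a
pair identifier and lets the pair hand it to whichever member is currently active. The names
(`Processor`, `ProcessorPair`, `Network`, `send`, `switch_over`) are hypothetical and are not taken from
the description; how the backup later obtains the message through the checkpointing mechanism is not
modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    inbox: list = field(default_factory=list)  # stands for the processor memory


@dataclass
class ProcessorPair:
    """A pair addressed as a whole; only one member is active at a time."""
    pair_id: int
    members: tuple          # (Processor, Processor)
    active: int = 0         # index of the member currently in active status

    def deliver(self, message):
        # The message goes to the memory of the active member only.
        self.members[self.active].inbox.append(message)

    def switch_over(self):
        # Hypothetical stand-in for the configuration controller swapping roles.
        self.active = 1 - self.active


class Network:
    """Interconnection network whose addresses name pairs, not processors."""

    def __init__(self):
        self.pairs = {}

    def attach(self, pair):
        self.pairs[pair.pair_id] = pair

    def send(self, dest_pair_id, message):
        # The sender never needs to know which member is active.
        self.pairs[dest_pair_id].deliver(message)


net = Network()
pair = ProcessorPair(122, (Processor("first"), Processor("second")))
net.attach(pair)
net.send(122, "m1")      # received by the first member
pair.switch_over()
net.send(122, "m2")      # same address, now received by the second member
```

The sender uses the address 122 before and after the switch-over, which is the property the
addressing scheme is meant to provide.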

The processor to processor communication must be protected against the message loss or duplication
resulting from a processor failure by an appropriate error recovery protocol.

The "Establish Recovery Point" action must not be issued by the program while a message in or
message out operation is underway. A message in or message out operation cannot therefore overlap
several computing stages, and a failure in the active processor is handled as follows:

- if a message out was performed during the aborted computing stage, the message out operation will
be reissued by the new active processor based upon the memory state at the last recovery point;
- if a message in was performed during the aborted computing stage, the message in will be ignored
by the new backup processor and will be resent by the sender.

As stated above, the checkpointing mechanism described in reference to figures 1 to 6 can further be
implemented in a multiprocessor system as shown in figure 8.

Figure 8 shows a plurality of processors (12-1, ..., 12-n) connected to a shared memory (140),
which is of the type of the shared memory described in a European patent application of the
same applicant, incorporated herein by reference. The processors are connected to the shared memory
through an interconnection network (142). The shared memory (140) comprises two sides, side A (144) and
side B (146), wherein the same information is duplicated in order to provide a hardened storage, for data
integrity.

As explained in detail in the above mentioned patent application, to communicate with each other, the
processors (12-1, 12-n) exchange messages using queues of records located in the shared memory, via
appropriate high level commands. The high level commands (PUT, GET, ENQ, DEQ) sent by the
processors are built up by memory interfaces (156) connected to said processors, and transmitted through
the interconnection network (142) to a memory command executor (PMCE, not shown), integrated in the
shared memory, for executing the high level commands. Said high level commands work with data records
identified by Logical Record Addresses (LRA) known by the processors. During execution of the high level
commands by the PMCE, the Logical Record Addresses are translated into physical addresses
corresponding to physical address space in the sides of the shared memory.

As shown in figure 8, any data structure, schematized by a duplicated record (148, 150) in the memory
(140), is duplicated in order to support any single hardware fault in the storage or in the access system. In
particular, a recovery point established for a task, as defined in relation to the present invention, is always
saved in the memory as a duplicated record rather than in an address space of a single memory dedicated to
each processor, as described in relation to figures 1 to 6. Thus, a recovery point benefits from the
protection provided by the duality of the memory (140) structure.

Similarly, rather than assigning a dedicated Write Ahead Queue (WAQ) to each processor, as described
in relation to figures 1 to 6, the implementation shown in figure 8 provides for duplicated Write Ahead
Queue records (152, 154) located in the sides (144, 146) of the memory (140). The remaining features of
the checkpointing mechanism according to the invention, i.e. the fail detector, status handler, and
configuration controller functions, are dedicated to each processor (12-1, 12-n), and are operated as
previously described.

However, due to the use of a shared memory connected to all the processors, the establishment of a
recovery point located in the packet memory is somewhat specific, because of the duplicated nature of the
memory sides (144, 146).

At the lowest level, a recovery point for a task is a set of data records (148, 150) located in dual
address spaces. If the update of the recovery point were done in parallel in both sides (144, 146), the same
false values would be stored in both sides in case of a crash of a processor during a PUT operation. For
instance, if a recovery point is defined by 8 bits, a crash during a PUT operation could leave the false value
nnnnnOOO as a new recovery point in both sides of the packet memory (where n represents a bit of the
new recovery point value, and O represents a bit of the old recovery point value).

Thus, to remedy this problem when the checkpointing mechanism is implemented using a shared
memory, it would be necessary to de-synchronize the update of the two sides of the shared memory, in
order to always have a consistent recovery point in at least one of the memory sides (144, 146).
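
The 8-bit example above can be reproduced with a short Python sketch. It is only an illustration under
stated assumptions: a write is modelled as proceeding bit by bit from left to right, a crash stops it
after a given number of bits, and the function names are hypothetical.

```python
def partial_write(old, new, bits_written):
    """Value left in storage when a write stops after `bits_written` bits."""
    return new[:bits_written] + old[bits_written:]


def parallel_update(old, new, crash_after):
    """Both sides are written in lockstep, so a crash tears both copies."""
    torn = partial_write(old, new, crash_after)
    return torn, torn


def sequential_update(old, new, crash_after):
    """Side A is written completely before side B is touched."""
    width = len(new)
    side_a = partial_write(old, new, min(crash_after, width))
    side_b = partial_write(old, new, max(crash_after - width, 0))
    return side_a, side_b


def has_consistent_side(sides, old, new):
    """True if at least one side holds a complete old or new recovery point."""
    return any(side in (old, new) for side in sides)


OLD, NEW = "OOOOOOOO", "nnnnnnnn"

# Crash after 5 bits: the parallel scheme leaves nnnnnOOO on both sides.
print(parallel_update(OLD, NEW, 5))
# The de-synchronized scheme still holds the complete old value in side B.
print(sequential_update(OLD, NEW, 5))
```

With the sequential scheme, whatever the crash position, either side B still holds the old value or side A
already holds the complete new value, so one consistent recovery point always survives.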

In an environment using the shared memory as explained above, the commit phase, consisting in
saving the data states created between two recovery points in a kind of cache, or commit list, materialized
by the Write Ahead Queues, comprises the steps of:

1. Saving the commit list in the Write Ahead Queues located in records of the shared memory (140).

2. Updating the side A (144) of the shared memory (140) with the latest recovery point.

3. Updating the side B (146) of the shared memory with the latest recovery point.

It is to be noted that, as long as the latest recovery point is not saved in both sides of the shared
memory, the execution of a task by a processor must not modify the data states outside this processor.
Therefore, all the external actions generated by the task execution are either saved in the commit list as
previously explained, or executed immediately but logged into an Undo List saved in the shared memory.
Thus, any external operation and its result can be annihilated in the occurrence of a failure while the task is
executing in the processor. Under such circumstances, the execution of the Undo List to erase the work of
the task would be followed by the steps of getting the previous recovery point from the packet memory, and
restarting the task with the data provided by said previous recovery point, in compliance with the
checkpointing scheme described in relation to figures 1 to 6. It is to be noted that, due to the dual saving of
the previous recovery point in two sides of the shared memory, the recovery procedure would slightly vary
according to the moment of the occurrence of a processor failure, respective to the steps of the commit
phase.

Thus, if the failure occurs during the phases 1 or 2 as previously defined, the new recovery point is not
yet present in side A of the packet memory, and therefore the recovery procedure has to start from the old
recovery point, and will perform the steps of:

a) copying the old recovery point stored in side B into side A.

b) executing the Undo List to erase the external actions already performed using the new recovery
point.

And, if the failure occurs during phase 3 as previously defined, the new valid recovery point is already
stored in side A of the packet memory, and therefore the recovery procedure has to start from the new
recovery point, and will perform the steps of:

a) copying the new recovery point, which is in side A, into side B.

b) executing the Commit List.

It is to be noted that, since the recovery procedure is performed by a backup processor, it is mandatory
to save in the shared memory an information corresponding to the phase in which the main processor was
when it failed.
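
The three-step commit phase and the two recovery cases described above can be sketched as follows.
This is a simplified model, not the actual implementation: the class and function names are hypothetical,
a half-written recovery point is represented by `None`, the phase indicator is assumed to be stored
before each step starts, the Undo List is assumed to be replayed in reverse order, and reading step b)
of the second case as "execute the commit list" is also an assumption.

```python
TORN = None  # stands for a recovery point left half-written by a crash


class Crash(Exception):
    """Raised to simulate a processor failure inside a commit step."""


class SharedMemory:
    def __init__(self, recovery_point):
        self.side_a = recovery_point   # side A copy of the recovery point
        self.side_b = recovery_point   # side B copy of the recovery point
        self.commit_list = []          # Write Ahead Queue records
        self.undo_list = []            # external actions executed immediately
        self.phase = 0                 # 0 = no commit in progress, else 1, 2 or 3


def commit(mem, new_rp, commit_list, crash_in_phase=None):
    """Run the commit phase; optionally crash in the middle of one step."""
    for phase in (1, 2, 3):
        mem.phase = phase              # saved so the backup knows where we were
        crashed = crash_in_phase == phase
        if phase == 1:                 # 1. save the commit list
            mem.commit_list = list(commit_list[:1] if crashed else commit_list)
        elif phase == 2:               # 2. update side A
            mem.side_a = TORN if crashed else new_rp
        else:                          # 3. update side B
            mem.side_b = TORN if crashed else new_rp
        if crashed:
            raise Crash(phase)
    mem.undo_list.clear()              # assumption: nothing left to undo
    mem.phase = 0


def recover(mem, apply_action, undo_action):
    """Recovery run by the backup processor; returns the recovery point to use."""
    if mem.phase == 3:
        mem.side_b = mem.side_a        # a) copy the new recovery point A -> B
        for action in mem.commit_list: # b) execute the commit list (assumed)
            apply_action(action)
    else:
        mem.side_a = mem.side_b        # a) copy the old recovery point B -> A
        for action in reversed(mem.undo_list):  # b) execute the Undo List
            undo_action(action)
    mem.commit_list.clear()
    mem.undo_list.clear()
    mem.phase = 0
    return mem.side_a
```

A failure outside any commit phase (phase 0) is treated here like phases 1 and 2: both sides already
hold the old recovery point, so only the Undo List has any effect.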

Claims

1. A checkpointing mechanism allowing the working process of an active processor to be resumed by a
backup processor when the active processor is failing, said checkpointing mechanism being associated
with at least one pair of information processing units comprising a first and a second information
processing units (10-1 and 10-2), each information processing unit including a processor (12-1, 12-2)
which runs a program stored in a memory (14-1, 14-2) attached to the processor through a memory bus
(16-1, 16-2) comprising data, address and control lines, and which can be set in an active, backup or fail
status under control of a configuration controller (22) responsive to failure detecting means (18-1, 18-2)
associated to each processor and detecting whether the associated processor is failing or not, the
checkpointing mechanism being characterized in that it comprises:

first memory change detecting means (28-1, 28-2) associated with at least the information processing
unit (12-1, 12-2) whose processor is initially set in the active status by the configuration controller to
receive the address and data on the memory bus causing the memory content to be changed and
generate memory change records therefrom,

first signalling means (28-1, 28-2, 62-1) in said information processing unit whose processor is initially
set in the active status by the configuration controller, responsive to a signal provided by said
processor at selected points of the program to generate an establish recovery point signal,

first storing means (32-1, 32-2) associated with at least the information processing unit whose processor
is initially set in the backup status by the configuration controller, said first storing means being
coupled to said first memory change detecting means to store the memory change records received
from said first memory change detecting means,

first control means (30-1, 30-2) associated with said first storing means and responsive to the establish
recovery point signal received from the first signalling means to cause a separating record to be stored
in the first storing means and the memory change records to be read from the first storing means and
written in the memory of the information processing unit whose processor is initially set in the backup
status, as long as separating records are stored in the first storing means, whereby when set in an
active status, the backup processor can resume the working process of the active status processor
when its status is switched from the active status to the fail status.

2. Checkpointing mechanism according to claim 1, characterized in that it comprises:

second memory change detecting means (28-1, 28-2) associated with the information processing unit
whose processor is initially set in the backup status by the configuration controller and is able to
receive the address and data on the memory bus causing the memory content to be changed and
generate memory change records therefrom,

second signalling means (28-1, 28-2, 62-1) in said information processing unit whose processor is
initially set in the backup status by the configuration controller, able to generate an establish recovery
point signal, in response to a signal provided by said processor at selected points of the program,

second storing means (32-1, 32-2) associated with the information processing unit whose processor is
initially set in the active status by the configuration controller, said second storing means being
coupled to said second memory change detecting means to store the memory change records
received from said second memory change detecting means,

second control means (30-1, 30-2) associated with said second storing means and responsive to the
establish recovery point signal received from the second signalling means to cause a separating record
to be stored in the second storing means and the memory change records to be read from the second
storing means and written in the memory of the information processing unit whose processor is initially
set in the active status, whereby any one of said processors in the first and second information
processing units can be initially set in the active or backup status by the configuration controller.

3. Checkpointing mechanism according to claim 2, characterized in that said memory (14-1, 14-2) and
said first and second storing means (32-1, 32-2) are constituted by dual shared records (148, 150, 152,
154) duplicated in two sides (144, 146) of a memory (140) shared by said active and backup
processors.

4. A checkpointing mechanism according to claim 2, characterized in that:

the first or second memory change detecting means, and the first or second signalling means are set in
operation when their associated processor is set in its active status by the configuration controller, and

the first or second storing means and the first or second control means are set in operation when their
associated processor is set in its backup status by the configuration controller.

5. Checkpointing mechanism according to claim 1, characterized in that the first storing means comprise a
dual-port first-in-first-out memory wherein the memory change records received from the first memory
change detecting means are queued.

6. Checkpointing mechanism according to claim 4 or 5, characterized in that:

the first storing means comprise a dual-port first-in-first-out memory wherein the memory change
records received from the first memory change detecting means are queued,

the second storing means comprise a dual-port first-in-first-out memory wherein the memory change
records received from the second memory change detecting means are queued.

7. Checkpointing mechanism according to claim 1 or 5, characterized in that the first control means
comprises:

counting means (72-2),

logic circuit means (70-2) responsive to the establish recovery point signal to cause the counting
means to be changed by a first fixed value (+1) from an initial value (0) and generate the separating
record to be written into the first storing means, said logic circuit means being responsive to the value
of the counter to cause the first storing means to be read if said value is different from the initial value
and the read record to be tested, and sent to the memory of the information processing unit if this
record is a memory change record or the counter value to be changed by a second fixed value (-1)
which is the opposite of the first fixed value if the record is a separating record.

8. Checkpointing mechanism according to claim 2, 4 or 6, characterized in that the first and second
control means comprise:

counting means (72-2),

logic circuit means (70-2) responsive to the establish recovery point signal to cause the counting
means to be changed by a first fixed value (+1) from an initial value (0) and generate the separating
record to be written into the first storing means, said logic circuit means being responsive to the value
of the counter to cause the associated storing means to be read if said value is different from the initial
value and the read record to be tested, and sent to the memory of the associated information
processing unit if this record is a memory change record or the counter value to be changed by a
second fixed value (-1) which is the opposite of the first fixed value if the record is a separating record.

9. Checkpointing mechanism according to any one of claims 1 to 7, characterized in that the active
processor sends the address of the program instruction at which the backup processor will have to
resume the working process, through the memory bus, so that this address constitutes a memory
change record which is written into the memory of the processor set in the backup status at a fixed
address.
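
For illustration only, the counter-driven behaviour of the control means recited in the claims (counter
incremented and separating record queued on each establish recovery point signal, queue read while the
counter differs from its initial value) can be sketched in Python. The class and method names are
hypothetical, and the first-in-first-out memory is modelled with a `deque`.

```python
from collections import deque

SEPARATOR = object()  # separating record marking an established recovery point


class MirroringControl:
    def __init__(self, backup_memory):
        self.queue = deque()            # first-in-first-out storing means
        self.counter = 0                # counting means, initial value 0
        self.backup_memory = backup_memory

    def memory_change(self, address, data):
        """Queue a memory change record captured on the active memory bus."""
        self.queue.append((address, data))

    def establish_recovery_point(self):
        """Increment the counter and queue a separating record."""
        self.counter += 1
        self.queue.append(SEPARATOR)

    def drain(self):
        """Read the queue for as long as the counter differs from 0."""
        while self.counter != 0:
            record = self.queue.popleft()
            if record is SEPARATOR:
                self.counter -= 1       # one committed stage fully mirrored
            else:
                address, data = record  # memory change record
                self.backup_memory[address] = data


backup = {}
control = MirroringControl(backup)
control.memory_change(0x10, 1)
control.establish_recovery_point()      # commits the first change
control.memory_change(0x10, 2)          # not yet committed
control.drain()
```

After `drain()`, the backup memory holds only the change made before the recovery point; the later
change stays in the queue until the next establish recovery point signal.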
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